Knowledge Injection via XAI: Predicting OOD Robustness

Parameter-Efficient Fine-Tuning with LoRA Adapters on DINOv2

Research Question: Can Explainability (XAI) metrics computed on clean images predict model robustness under Out-of-Distribution (OOD) corruptions?

Hypothesis: Attention-based XAI metrics (Entropy, Deletion Score) extracted from clean images serve as reliable early indicators of model robustness under distribution shift.


Table of Contents

  1. Configuration and Data Loading
  2. Medallion Architecture Pipeline
    • Bronze Layer: Distributed Feature Extraction
    • Silver Layer: XAI Metrics Computation
    • OOD Layer: Corruption-Based Robustness Testing
    • Gold Layer: Correlation Analysis and Meta-Learner
  3. Adapter Zoo: LoRA Training
  4. XAI Metrics Framework
  5. OOD Corruption Strategy
  6. Robustness Analysis
  7. XAI-Robustness Correlations
  8. Meta-Learner Performance
  9. Feature Importance Analysis
  10. Conclusions
Loaded 10 Gold Layer datasets
Training metrics: 3 adapter(s)
Available datasets: ['correlations', 'classifier_comparison', 'feature_importance', 'adapter_summary', 'degradation', 'qualitative_summary', 'quantitative_summary', 'xai_feature_ranking', 'adapter_ranking', 'worst_corruption']

2. Medallion Architecture Pipeline

The experimental pipeline follows a Medallion Architecture (Bronze - Silver - Gold) with a dedicated OOD evaluation layer, implemented using Apache Spark for distributed processing.

2.1 Bronze Layer: Distributed Feature Extraction

Purpose: Extract embeddings from raw images using DINOv2 backbone with Spark Pandas UDFs.

Component Description
Backbone facebook/dinov2-base (ViT-B/14, 86M parameters)
Optimization FlashAttention + SDPA, FP16 inference
Framework PySpark Pandas UDFs for distributed execution
Output CLS token (768-dim) + Patch tokens (256 x 768)
# Key implementation (bronze_layer.py)
model = AutoModel.from_pretrained(
    "facebook/dinov2-base",
    torch_dtype=torch.float16,
    attn_implementation="sdpa"  # Scaled Dot-Product Attention
)

2.2 Silver Layer: Distributed XAI Extraction

Purpose: Apply LoRA adapters and compute explainability metrics on clean images.

Metric Formula Interpretation
Attention Entropy $H = -\sum_i p_i \log_2(p_i)$ (normalized) Focus metric: High = dispersed attention
Sparsity Gini coefficient on attention weights Concentration: High = focused attention
Deletion Score AUC of confidence when removing important patches Faithfulness (RISE): Lower = meaningful attention
Insertion Score AUC of confidence when adding important patches Faithfulness: Higher = meaningful attention

2.3 OOD Layer: Corruption-Based Robustness Testing

Purpose: Evaluate adapter robustness under controlled image corruptions.

Corruption Severity Levels Parameters
Gaussian Noise shallow, medium, heavy $\sigma \in \{15, 40, 80\}$
Blur shallow, medium, heavy radius $\in \{1.0, 3.0, 6.0\}$
Contrast shallow, medium, heavy factor $\in \{0.7, 0.4, 0.15\}$

Output: Binary is_correct label per (image, adapter, corruption) tuple.

2.4 Gold Layer: Correlation Analysis and Meta-Learner

Purpose: Validate hypothesis and train Meta-Learner to predict robustness from XAI metrics.

Analysis Method
Correlation Pearson, Spearman, Point-Biserial
Effect Size Cohen's d, Separation Ratio
Meta-Learner XGBoost, RandomForest, LogisticRegression
Validation 5-Fold Stratified CV, Permutation Importance

3. Adapter Zoo: LoRA Training

Parameter-Efficient Fine-Tuning (PEFT) with LoRA

The Adapter Zoo contains three Low-Rank Adaptation (LoRA) adapters with varying capacities, trained on the DINOv2-base backbone.

LoRA Configuration:

  • Technique: DoRA (Weight-Decomposed LoRA) + RsLoRA (Rank-Stabilized)
  • Alpha Scaling: $\alpha = 2 \times r$ (scaling factor)
  • Target Modules: query, value, fc1, fc2 (attention + MLP layers)
  • Dropout: 0.1

Training Hyperparameters:

  • Optimizer: AdamW with learning rate $3 \times 10^{-4}$
  • Epochs: 15 with gradient accumulation (factor 2)
  • Batch Size: 16 (effective 32 with accumulation)
  • Regularization: Dropout = 0.1, DoRA + RsLoRA enabled
  • Target Modules: query, value, fc1, fc2

Data Augmentation:

  • Random rotation (30 degrees)
  • Horizontal flip (p=0.5)
  • Color jitter (brightness/contrast 0.2)
  • Random crop (224x224 from 256x256)
LoRA Adapter Training Results
================================================================================
  Rank Alpha Trainable Params Trainable (%) Train Loss Eval Loss Accuracy F1 Score Precision Recall Duration (min)
0 4 8 702K 0.80% 0.251 0.119 96.06% 96.04% 96.49% 96.06% 37.4
1 16 32 2.25M 2.53% 0.307 0.121 96.74% 96.72% 96.88% 96.74% 36.8
2 32 64 4.31M 4.75% 0.477 0.169 95.38% 95.37% 95.60% 95.38% 38.1

4. XAI Metrics Framework

The XAI framework computes four complementary metrics from attention maps to assess model interpretability and predict robustness.

Metric Definitions

1. Attention Entropy (normalized Shannon entropy)

$$H = -\frac{\sum_{i=1}^{N} p_i \log_2(p_i)}{\log_2(N)}$$

Where $p_i$ is the attention weight for patch $i$, normalized to $[0, 1]$.

  • High entropy = Dispersed, unfocused attention
  • Low entropy = Concentrated, focused attention

2. Sparsity (Gini Coefficient)

$$S = 1 - \frac{2}{N} \sum_{i=1}^{N} (N - i + 0.5) \cdot p_{(i)}$$

Where $p_{(i)}$ are sorted attention weights.

  • High sparsity = Attention concentrated on few patches
  • Low sparsity = Attention distributed across many patches

3. Deletion Score (Faithfulness Metric from RISE)

Progressively remove patches in order of importance (highest attention first) and measure AUC of confidence drop:

$$\text{Deletion} = \text{AUC}\left(\frac{f(\text{masked})}{f(\text{original})}\right)$$
  • Lower score = Attention correctly identifies important regions

4. Insertion Score (Faithfulness Metric)

Progressively reveal patches starting from blank image and measure AUC of confidence recovery:

$$\text{Insertion} = \text{AUC}(f(\text{revealed}))$$
  • Higher score = Attention correctly identifies important regions

5. OOD Corruption Strategy

Corruption Types and Severity Levels

The OOD Layer applies three corruption types at three severity levels to stress-test adapter robustness under distribution shift.

Each corruption simulates real-world image degradation scenarios:

  • Gaussian Noise: Sensor noise, low-light conditions
  • Blur: Motion blur, defocus
  • Contrast: Lighting variations, exposure issues
OOD Corruption Configuration
============================================================
  Corruption Level Parameter Value Expected Impact
0 Gaussian Noise shallow sigma 15.000000 Low
1 Gaussian Noise medium sigma 40.000000 Medium
2 Gaussian Noise heavy sigma 80.000000 High
3 Blur shallow radius 1.000000 Low
4 Blur medium radius 3.000000 Medium
5 Blur heavy radius 6.000000 High
6 Contrast shallow factor 0.700000 Low
7 Contrast medium factor 0.400000 Medium
8 Contrast heavy factor 0.150000 High

6. Robustness Analysis

6.1 Adapter Performance on OOD Data

How do the adapters from the Adapter Zoo perform under corrupted images? Lower-rank adapters are expected to generalize better due to implicit regularization.

Adapter Zoo: OOD Performance Summary
================================================================================
adapter_rank accuracy mean_entropy mean_sparsity mean_deletion mean_insertion std_entropy n_samples
1 16 0.8920 0.7417 0.7343 0.4808 0.8895 0.0400 33120
2 32 0.7588 0.7908 0.7212 0.4472 0.8676 0.0413 33120
0 4 0.9501 0.6752 0.7659 0.4834 0.8981 0.0356 33120
Key Findings:
  - Best OOD robustness:  Rank 4 (95.0%)
  - Worst OOD robustness: Rank 32 (75.9%)
  - Performance gap: 19.1 percentage points

6.2 Accuracy Degradation by Corruption

How much does accuracy drop from shallow to heavy corruption for each type?

Degradation Statistics
============================================================
Max drop:     81.1%
Min drop:     0.4%
Mean drop:    30.5%

Worst case:   Rank 32 + blur (81.1% drop)

6.3 XAI Metrics Distribution by Adapter


7. XAI-Robustness Correlations

Core Research Question: Do XAI metrics on clean images predict failures on corrupted images?

Statistical Measures

Metric Description Interpretation
Pearson r Linear correlation Direction and strength of linear relationship
Spearman r Rank correlation Monotonic relationship (robust to outliers)
Cohen's d Effect size Practical significance: small (< 0.2), medium (0.2-0.8), large (> 0.8)
Separation Ratio Mean difference / pooled std Discriminability between correct and wrong predictions
XAI Feature Correlations with OOD Robustness
================================================================================
  feature pearson_r spearman_r cohens_d separation_ratio mean_correct mean_wrong
0 entropy -0.166600 -0.170900 -0.497600 0.255500 0.731900 0.762000
1 sparsity 0.031300 0.031100 0.092300 0.046100 0.741300 0.735400
2 deletion_score 0.111600 0.111300 0.330600 0.165000 0.478600 0.417300
3 insertion_score 0.183100 0.153000 0.548200 0.230900 0.891600 0.842400
Interpretation Guide:
  - Negative r (entropy): Higher entropy = LESS robust
  - Positive r (insertion): Higher score = MORE robust
  - Cohen's d > 0.5: Medium effect size (meaningful difference)

8. Meta-Learner Performance

Meta-Learner Design

The meta-learner predicts whether a sample will be correctly classified under corruption, using only XAI features from clean images.

Training Configuration:

  • Input: 4 XAI features (entropy, sparsity, deletion_score, insertion_score)
  • Target: Binary label (is_correct under corruption)
  • Split: 80% train / 20% test, stratified
  • Scaling: StandardScaler on features
  • Validation: 5-Fold Stratified Cross-Validation

Models Compared:

Model Key Hyperparameters
RandomForest n_estimators=200, max_depth=12, balanced class weights
XGBoost n_estimators=200, max_depth=6, L1=0.1, L2=1.0
XGBoost_Tuned n_estimators=300, max_depth=4, L1=0.5, L2=2.0, gamma=0.1
LogisticRegression C=0.1 (strong L2), balanced class weights
Meta-Learner Performance Comparison
================================================================================
  model accuracy roc_auc f1 precision recall cv_auc_mean cv_auc_std
0 RandomForest 67.38% 0.719 0.783 92.41% 67.96% 0.710 0.006
1 XGBoost 63.94% 0.731 0.751 93.53% 62.75% 0.721 0.006
2 XGBoost_Tuned 63.96% 0.739 0.750 93.90% 62.49% 0.729 0.006
3 LogisticRegression 65.69% 0.732 0.767 93.33% 65.07% 0.725 0.008
Best Model: XGBoost_Tuned (ROC-AUC: 0.739)

9. Feature Importance Analysis

Which XAI metrics contribute most to robustness prediction?

Feature Importance Ranking
================================================================================
feature xgb_importance rf_importance perm_importance perm_std lr_coef lr_odds_ratio
0 entropy 0.4247 0.3588 0.0654 0.0022 -0.9629 0.3818
1 sparsity 0.1450 0.1691 0.0336 0.0018 -0.5455 0.5795
2 insertion_score 0.2519 0.2669 0.0201 0.0019 0.4363 1.5470
3 deletion_score 0.1784 0.2053 0.0136 0.0018 0.2085 1.2318
Importance Measures:
  - XGB Importance: Gain-based importance from XGBoost
  - Permutation: Drop in accuracy when feature is shuffled
  - LR Odds Ratio: exp(coefficient) from LogisticRegression

10. Summary and Conclusions

10.1 Qualitative and Quantitative Summary

QUALITATIVE SUMMARY
============================================================
metric value
0 Best XAI predictor insertion_score
1 Highest correlation 0.183
2 Best effect size (Cohen's d) 0.548
3 Best meta-learner XGBoost_Tuned
4 Meta-learner AUC 0.739
QUANTITATIVE SUMMARY
============================================================
metric value
0 Adapters tested 3
1 Corruption types 3
2 Max accuracy drop (%) 81.07
3 Avg accuracy drop (%) 30.54
WORST CORRUPTION PER ADAPTER
============================================================
adapter_rank worst_corruption max_drop_pct
0 16 blur 51.162155
1 32 blur 81.071429
2 4 blur 19.521584
================================================================================
                            KEY RESEARCH FINDINGS
================================================================================

1. XAI METRICS PREDICT ROBUSTNESS

   * Entropy (r=-0.17): Higher entropy indicates less robust predictions
   * Insertion Score (r=+0.18): Best positive predictor of robustness
   * Cohen's d up to 0.55: Medium effect size confirms practical significance

2. ADAPTER RANK MATTERS

   * Rank 4:  Best OOD robustness (~95%) despite fewer parameters
   * Rank 32: Worst OOD robustness (~76%) - evidence of OVERFITTING
   * Conclusion: Lower rank = better generalization to corrupted data

3. CORRUPTION IMPACT VARIES SIGNIFICANTLY

   * Blur:     Most devastating (up to 81% accuracy drop at heavy level)
   * Gaussian: Moderate impact (~41% drop at heavy level)
   * Contrast: Minimal impact (<1% drop even at heavy level)

4. META-LEARNER ACHIEVES PREDICTIVE POWER

   * XGBoost ROC-AUC: ~0.74 (predicting failures from clean-image XAI metrics)
   * Validation: Hypothesis CONFIRMED - XAI metrics can predict OOD robustness

================================================================================

10.2 Conclusions

Hypothesis Validation: CONFIRMED

  1. XAI metrics computed on clean images CAN predict OOD robustness (ROC-AUC ~0.74)

  2. Entropy is the most informative metric - models with higher attention entropy are less robust under corruption

  3. Lower LoRA rank generalizes better - Rank 4 outperforms Rank 32 on corrupted data despite having 6x fewer parameters

  4. Blur is the most challenging corruption - up to 81% accuracy drop, while contrast changes are almost harmless

  5. Practical application: Use XAI metrics as early warning system for robustness issues before deployment


Future Work

  • Extend to additional corruption types (JPEG compression, weather effects)
  • Test on other vision backbones (ConvNeXt, CLIP)
  • Investigate per-class robustness patterns
  • Deploy meta-learner as real-time monitoring tool